A directed network model for World-Wide Web

In this paper, a directed network model for the World-Wide Web is presented. The out-degree of each added node is assumed to follow a scale-free distribution with mean value $m$. The model exhibits the small-world effect, meaning that the resulting networks have a very short average distance and a large clustering coefficient. More interestingly, the in-degree distribution obeys a power law with exponent $\gamma = 2 + 1/m$, which depends on the average out-degree. This finding is supported by empirical data and has not been emphasized in previous studies of directed networks.

PACS numbers: 89.75.Fb, 89.75.Hc, 89.65.-s

I. INTRODUCTION

The last few years have witnessed tremendous activity devoted to the characterization and understanding of complex networks. Researchers have described many real-world systems as complex networks, with nodes representing individuals or organizations and edges representing the interactions among them. Commonly cited examples include technological networks, information networks, social networks and biological networks [4]. The results of many experiments and statistical analyses indicate that networks in various fields share some common characteristics: they have small average distances like random graphs, large clustering coefficients like regular networks, and power-law degree distributions. These characteristics are called the small-world effect and the scale-free property [6].

Motivated by the empirical studies on various real-life networks, several novel network models have been proposed recently. The first successful attempt to generate networks with a high clustering coefficient and a small average distance is that of Watts and Strogatz (the WS model). The WS model starts with a ring lattice of $N$ nodes in which every node is connected to its first $2m$ neighbors. The small-world effect emerges when each edge of the lattice is randomly rewired with probability $p$, with self-connections and duplicate edges excluded. The rewired edges are called long-range edges; they connect nodes that would otherwise belong to different neighborhoods. Recently, some authors have demonstrated that the small-world effect can also be produced by deterministic methods.
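The rewiring procedure described above can be sketched in a few lines. This is a minimal illustration, not the original WS implementation: the function name `watts_strogatz`, the choice to rewire only the far end of each lattice edge, and the `seed` parameter are assumptions made here for the example.

```python
import random

def watts_strogatz(n, m, p, seed=0):
    """Return an undirected graph as {node: set(neighbors)}; requires n > 2*m."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Ring lattice: each node is linked to its m nearest neighbors on each side.
    for i in range(n):
        for j in range(1, m + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    # Rewire the far end of each lattice edge with probability p.
    for j in range(1, m + 1):
        for i in range(n):
            old = (i + j) % n
            if old not in adj[i] or rng.random() >= p:
                continue
            # Exclude i itself (no self-connection) and current neighbors (no duplicate edge).
            candidates = [v for v in range(n) if v != i and v not in adj[i]]
            if candidates:
                new = rng.choice(candidates)
                adj[i].remove(old)
                adj[old].remove(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj
```

Each rewiring removes one edge and adds one new edge, so the total number of edges stays at $Nm$ for any $p$; with $p = 0$ the ring lattice is returned unchanged.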

Another significant model, which captures the scale-free property, was proposed by Barabási and Albert (the BA model). Two special features, growth and preferential attachment, are investigated in BA networks to account for the scale-free behavior of the Internet, the WWW, scientific co-authorship networks, etc. These features point to the fact that many real-world networks grow continuously through the addition of new nodes, and that new nodes tend to connect to existing nodes with a large number of neighbors.

While the BA model captures the basic mechanism responsible for the power-law degree distribution, it is still a minimal model with several limitations: it predicts only a fixed exponent for the power-law degree distribution, and the clustering coefficient of BA networks is very small and decreases as the network size increases, following approximately $C \sim \ln^2 N / N$. To further understand the various microscopic evolution mechanisms and to overcome the BA model's discrepancies, there have been several promising attempts. For example, the aging effect on nodes' attractiveness has led to studies of aging models, the geometrical effect on the appearance probability of edges has led to studies of networks in Euclidean space, and the self-similar effect on the existence of hierarchical structures has led to studies of hierarchical models.

One of the most extensively studied networks is the World-Wide Web, which can be treated as a directed network with power-law distributions for both in-degree and out-degree. In addition, it is a small-world network. Since knowledge of the evolution mechanism is very important for a better understanding of the dynamics built upon the WWW, many theoretical models have been constructed previously. However, these models have not considered the relationship between the in-degree distribution and the out-degree distribution.

In this paper, we propose a directed network model for the World-Wide Web. The model displays both scale-free and small-world properties, and the power-law exponent of its in-degree distribution is determined by the average out-degree. Comparisons among the empirical data, analytic results and simulation results strongly suggest that the present model is valid. The rest of this paper is organized as follows. In Section II, the present model is introduced. In Section III, the analyses and simulations of the network properties are presented, including the degree distribution, the average distance and the clustering coefficient. Finally, in Section IV, the main conclusions are drawn.

FIG. 1: Degree distributions for different $N$ and $m$. In this figure, $p(k)$ denotes the probability that a randomly selected node has in-degree $k$. When $m = 1$, the power-law exponents $\gamma$ of the density functions are $\gamma_{20000} = 2.95 \pm 0.06$ and $\gamma_{80000} = 2.97 \pm 0.04$. When $m = 2$, $\gamma_{20000} = 2.46 \pm 0.07$ and $\gamma_{80000} = 2.47 \pm 0.03$. When $m = 3$, $\gamma_{20000} = 2.29 \pm 0.08$ and $\gamma_{80000} = 2.31 \pm 0.03$. When $m = 4$, $\gamma_{20000} = 2.21 \pm 0.07$ and $\gamma_{80000} = 2.23 \pm 0.03$. The four dashed lines for $m = 1, 2, 3, 4$ have slopes $-3$, $-2-1/2$, $-2-1/3$ and $-2-1/4$, respectively, for comparison.



II. THE MODEL 

Our model starts with a connected graph of $N_0$ nodes and $m_0$ edges. At each time step $i$, a new node $v_i$ is added and $2e_i$ existing nodes are chosen to be its neighbors. The choosing procedure involves two processes: preferential attachment and neighboring attachment. First, in the preferential attachment process, $e_i$ nodes, denoted by the set $Q_i$, are selected with probability proportional to their in-degrees. Then, in the neighboring attachment process, for each node $x \in Q_i$, one of its neighbors is randomly selected to connect to $v_i$. Combining these two processes, a total of $2e_i$ nodes are chosen as the new node's neighbors. Throughout the evolution, self-connections and duplicate edges are excluded.

It should be emphasized that, since the out-degree of the WWW is not fixed but approximately obeys a power law, the number of edges added during one time step, $2e_i$, is not a constant but a random number that also obeys a power law. The average out-degree $m$ is fixed, and it significantly affects the in-degree distribution exponent, the average distance and the clustering coefficient of the whole network.
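The growth rule can be sketched as a short simulation. This is a simplified illustration under several assumptions that are not in the text: the number of preferential picks `e` is held fixed instead of being drawn from a power law, the seed graph is a directed ring of $2e + 1$ nodes, the attachment weight is taken as in-degree plus one (the choice $A = 1$ used in Section III A), and neighbors are chosen ignoring edge direction. The name `grow_network` is hypothetical.

```python
import random

def grow_network(n_total, e=1, seed=0):
    """Grow a directed graph; returns (out_nbrs, in_nbrs) as dicts of sets."""
    rng = random.Random(seed)
    n0 = 2 * e + 1  # seed size (assumption): enough nodes to offer 2e distinct targets
    # Seed graph (assumption): a directed ring, which is connected.
    out_nbrs = {i: {(i + 1) % n0} for i in range(n0)}
    in_nbrs = {i: {(i - 1) % n0} for i in range(n0)}
    # Node x appears (in-degree + 1) times in the pool, so a uniform draw
    # selects x with probability proportional to k_in + 1.
    pool = [i for i in range(n0) for _ in range(2)]
    for v in range(n0, n_total):
        # Preferential attachment: e distinct nodes forming the set Q_i.
        q = set()
        while len(q) < e:
            q.add(rng.choice(pool))
        targets = set(q)
        # Neighboring attachment: one not-yet-chosen neighbor of each x in Q_i.
        for x in sorted(q):
            candidates = sorted((out_nbrs[x] | in_nbrs[x]) - targets)
            if candidates:
                targets.add(rng.choice(candidates))
        out_nbrs[v], in_nbrs[v] = set(targets), set()
        for t in sorted(targets):
            in_nbrs[t].add(v)
            pool.append(t)  # t gained one in-link
        pool.append(v)      # the "+1" entry of the new node
    return out_nbrs, in_nbrs
```

Because targets are collected in a set of already existing nodes, self-connections and duplicate edges cannot occur. A new node may end up with fewer than $2e$ links when all neighbors of a chosen node are already targets.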



III. THE STATISTICAL CHARACTERISTICS 

In this section, the scale-free and small-world characteristics of the present model are shown.

A. The Scale-free Property 

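The probability that a newly added node connects to an existing node is taken to be proportional to the in-degree $k$ of the old vertex. Since vertices with zero in-degree must also be able to receive links, each vertex is assigned an initial attractiveness $A$, so that the probability of attachment to an old vertex is proportional to $k + A$, where $A$ is a constant; we set $A = 1$ for simplicity [32]. Let $p_k$ denote the fraction of vertices with in-degree $k$. Because the mean in-degree equals the mean out-degree $m$, we have $\sum_k (k+1)p_k = m + 1$, and the probability that a new edge attaches to any of the vertices with in-degree $k$ is

$$\frac{(k+1)p_k}{\sum_k (k+1)p_k} = \frac{(k+1)p_k}{m+1}. \tag{1}$$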



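The mean out-degree of the newly added node is simply $m$, hence the mean number of new edges to vertices with current in-degree $k$ is $(k+1)p_k m/(m+1)$. Denoting by $p_{k,n}$ the value of $p_k$ when the network size is $n$, the change of $np_k$ is

$$(n+1)p_{k,n+1} - np_{k,n} = \begin{cases} \left[k p_{k-1,n} - (k+1)p_{k,n}\right]\dfrac{m}{m+1}, & k \ge 1; \\ 1 - p_{0,n}\dfrac{m}{m+1}, & k = 0. \end{cases} \tag{2}$$

The stationary condition $p_{k,n+1} = p_{k,n} = p_k$ yields

$$p_k = \begin{cases} \left[k p_{k-1} - (k+1)p_k\right] m/(m+1), & k \ge 1; \\ 1 - p_0\, m/(m+1), & k = 0. \end{cases} \tag{3}$$

Rearranging, one gets

$$p_k = \begin{cases} \dfrac{k}{k+2+1/m}\, p_{k-1}, & k \ge 1; \\ (m+1)/(2m+1), & k = 0. \end{cases} \tag{4}$$

This yields

$$p_k = \frac{k(k-1)\cdots 1}{(k+2+1/m)\cdots(3+1/m)}\, p_0 = (1+1/m)\, B(k+1,\, 2+1/m), \tag{5}$$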



where $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is Legendre's beta function, which behaves asymptotically as $a^{-b}$ for large $a$ and fixed $b$; hence

$$p_k \sim k^{-(2+1/m)}. \tag{6}$$

This leads to $p_k \sim k^{-\gamma_i}$ with $\gamma_i = 2 + 1/m$ for large $N$, where $\gamma_i$ is the exponent of the in-degree distribution.
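The closed form $p_k = (1+1/m)B(k+1, 2+1/m)$ and its power-law tail can be checked numerically. The sketch below uses only the standard library; the function names `in_degree_pmf` and `tail_exponent` and the two sample points `k1`, `k2` are choices made here for illustration.

```python
import math

def in_degree_pmf(k, m):
    """Stationary p_k = (1 + 1/m) * B(k + 1, 2 + 1/m), evaluated via log-gamma."""
    a, b = k + 1.0, 2.0 + 1.0 / m
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (1.0 + 1.0 / m) * math.exp(log_beta)

def tail_exponent(m, k1=10_000, k2=100_000):
    """Estimate gamma_i from the log-log slope of p_k between two large values of k."""
    rise = math.log(in_degree_pmf(k2, m)) - math.log(in_degree_pmf(k1, m))
    return -rise / (math.log(k2) - math.log(k1))
```

For $m = 1$ the estimated slope is close to 3, and for $m = 4$ it is close to 2.25, in line with $\gamma_i = 2 + 1/m$. The values `in_degree_pmf(0, m)` reproduce $p_0 = (m+1)/(2m+1)$.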

In Fig. 1, the degree distributions for $m = 1, 2, 3, 4$ are shown. The simulation results agree very well with the analytic result and indicate that the exponent of the degree distribution does not depend on the network size $N$.

One of the significant empirical results on the in- and out-degree distributions was reported by Albert, Jeong and Barabási [33]. In that study, a crawl from Altavista was used. The appearance of the WWW from the point of view of Altavista is as follows:

• In May 1999 the Web consisted of $203 \times 10^6$ vertices and $1466 \times 10^6$ hyperlinks. The average in- and out-degree were $k_{\rm in} = k_{\rm out} = 7.22$.

• In October 1999 there were already $271 \times 10^6$ vertices and $2130 \times 10^6$ hyperlinks. The average in- and out-degree were $k_{\rm in} = k_{\rm out} = 7.85$.

The distributions were found to be of power-law form with exponents $\gamma_i = 2.1$ and $\gamma_o = 2.7$, where $\gamma_o$ is the exponent of the out-degree distribution. For $k_{\rm out} = 7.22$ and $7.85$, one obtains from $\gamma_i = 2 + 1/m$ that $\gamma_i = 2.138$ and $2.127$, respectively, which are very close to 2.1 and thus give strong support to the validity of the present model.



B. The Average Distance

The average distance plays a significant role in measuring the transmission delay, and is thus one of the most important parameters for measuring the efficiency of a communication network. Since the original concept of the small-world effect is defined for undirected networks, hereinafter we consider only the undirected version of our model, that is, the directed edge $E_{ij}$ from node $i$ to node $j$ is considered to be a bidirectional edge between nodes $i$ and $j$. Nodes are labeled according to the time at which they are added to the network. Denoting by $d(i,j)$ the distance between nodes $i$ and $j$, the average distance of a network of size $N$ is defined as

$$L(N) = \frac{2\sigma(N)}{N(N-1)}, \tag{7}$$

where the total distance is

$$\sigma(N) = \sum_{1 \le i < j \le N} d(i,j). \tag{8}$$

Clearly, the distance between existing nodes does not increase with the network size $N$, thus we have

$$\sigma(N+1) \le \sigma(N) + \sum_{i=1}^{N} d(i, N+1). \tag{9}$$

Denote by $y = \{y_1, y_2, \cdots, y_l\}$ the set of nodes to which the $(N+1)$th node is connected. The distance $d(i, N+1)$ can be expressed as

$$d(i, N+1) = \min\{d(i, y_j) \mid j = 1, 2, \cdots, l\} + 1. \tag{10}$$

Combining the results above, we have

$$\sigma(N+1) \le \sigma(N) + (N-l) + \sum_{i \in \Lambda} D(i, y), \tag{11}$$

where $\Lambda = \{1, 2, \cdots, N\} - \{y_1, y_2, \cdots, y_l\}$ is a node set with cardinality $N-l$ and $D(i,y) = \min\{d(i,y_j) \mid j = 1, 2, \cdots, l\}$. Considering the set $y$ as a single node, the sum $\sum_{i \in \Lambda} D(i,y)$ can be treated as the total distance from all the nodes in $\Lambda$ to $y$, and can thus be expressed approximately in terms of $L(N-l)$:

$$\sum_{i \in \Lambda} D(i,y) \approx (N-l)L(N-l). \tag{12}$$

Because the average distance $L(N)$ increases monotonically with $N$, this yields

$$(N-l)L(N-l) = (N-l)\frac{2\sigma(N-l)}{(N-l)(N-l-1)} < \frac{2\sigma(N)}{N-l-1}. \tag{13}$$

Then we obtain the inequality

$$\sigma(N+1) < \sigma(N) + (N-l) + \frac{2\sigma(N)}{N-l-1}. \tag{14}$$

Enlarging $\sigma(N)$, an upper bound on the growth of $\sigma(N)$ is obtained from the equation

$$\frac{d\sigma(N)}{dN} = N - l + \frac{2\sigma(N)}{N-l-1}. \tag{15}$$

This leads to the following solution:

$$\sigma(N) = (N-l-1)^2 \log(N-l-1) - (N-l-1) + C_1 (N-l-1)^2. \tag{16}$$

From Eq. (7), we have $\sigma(N) \sim N^2 L(N)$, thus $L(N) \sim \ln N$. Since Eq. (14) is an inequality, the precise growth of the average distance $L(N)$ may be a little slower than $\ln N$. The simulation results are reported in Fig. 2.

FIG. 2: The average distance $L$ vs network size $N$ of the undirected version of the present model. One can see that $L$ increases very slowly as $N$ increases. The main plot shows $L$ as a function of $\ln N$, which is well fitted by a straight line. When $m = 1$, the curve is above the fitting line for $N < 3000$ and below the line for $N > 4000$. When $m = 2, 3, 4$, the curve is below the line for $N > 200$, which indicates that $L$ grows approximately as $\ln N$, and in fact a little more slowly than $\ln N$.
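The average distance of Eqs. (7) and (8) can be measured on any connected undirected graph with a breadth-first search from every node. The sketch below is one straightforward way to do this, assuming the graph is given as a dictionary mapping each node to the set of its neighbors; the name `average_distance` is chosen here for illustration.

```python
from collections import deque

def average_distance(adj):
    """L(N) = 2*sigma(N) / (N*(N-1)) for a connected undirected graph {node: set(neighbors)}."""
    n = len(adj)
    total = 0  # sums d(i, j) over ordered pairs, i.e. 2*sigma(N)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    return total / (n * (n - 1))
```

For a path of three nodes the pairwise distances are 1, 1 and 2, so $\sigma = 4$ and $L = 4/3$. The cost is one search per node, which is adequate for the network sizes of a few thousand nodes considered here.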
FIG. 3: Degree distribution of the undirected version of the present model. At each time step, the new node selects $m = 1, 2, 3, 4$ edges to connect, respectively. For $m = 1, 2, 3, 4$, the power-law exponents $\gamma$ of the density functions are $\gamma_{1,80000} = 2.95 \pm 0.06$, $\gamma_{2,80000} = 2.97 \pm 0.05$, $\gamma_{3,80000} = 2.96 \pm 0.07$ and $\gamma_{4,80000} = 2.96 \pm 0.06$, respectively. The dashed line has slope $-3.0$ for comparison.



C. The Clustering Coefficient 

The clustering coefficient is defined as $C = \frac{1}{N}\sum_{i=1}^{N} C_i$, where

$$C_i = \frac{2E(i)}{k_i(k_i-1)} \tag{17}$$

is the local clustering coefficient of node $i$, $k_i$ is its degree, and $E(i)$ is the number of edges among the neighbors of node $i$. Approximately, when node $i$ is added to the network, it has degree $2e_i$ and $E(i) \approx e_i$ if the network is sparse enough. In the sparse case, each time a new node is added as a neighbor of $i$, $E(i)$ increases by 1. Therefore, $E(i)$ can be written in terms of $k_i$ as

$$E(i) = e_i + (k_i - 2e_i) = k_i - e_i. \tag{18}$$

Hence, we have

$$C_i = \frac{2(k_i - e_i)}{k_i(k_i-1)}. \tag{19}$$

This expression indicates that the local clustering scales as $C(k) \sim k^{-1}$. It is interesting that a similar scaling has been observed in the pseudofractal web [19] and in several real-life networks. In Fig. 4, we report the simulation result for the relationship between $C(k)$ and $k$, which is in good accordance with both the analytic result and the empirical data.
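The definition in Eq. (17) can be evaluated directly on an undirected graph stored as a dictionary of neighbor sets. The following is a minimal sketch; the function names and the convention that nodes of degree below 2 contribute zero are assumptions made for the example.

```python
def local_clustering(adj, i):
    """C_i = 2*E(i) / (k_i*(k_i - 1)); taken as 0 when k_i < 2 (a convention assumed here)."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Every edge among the neighbors is counted from both of its endpoints, hence // 2.
    links = sum(len(adj[u] & nbrs) for u in nbrs) // 2
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """C = (1/N) * sum of the local clustering coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)
```

For a triangle with one pendant node attached to a corner, the corner has $k = 3$ and $E = 1$, giving $C_i = 1/3$, the other two triangle nodes have $C_i = 1$, and the pendant node contributes 0, so $C = 7/12$.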
Averaging Eq. (19) over all nodes gives

$$C = \frac{2}{N}\sum_{i=1}^{N}\frac{k_i - e_i}{k_i(k_i-1)} = \frac{2}{N}\sum_{i=1}^{N}\frac{k_i^{\rm in} + e_i}{k_i(k_i-1)}, \tag{20}$$

where $k_i^{\rm in} = k_i - 2e_i$ denotes the in-degree of the $i$th node. Because the average out-degree is $m$, one can replace the out-degree of each node by $m$. From Fig. 3, the degree distribution of the undirected network is $p(k) \sim k^{-3}$, where $k = k_{\min}, k_{\min}+1, \cdots, k_{\max}$. As an example, when $m = 1$ each node has $e_i = 1$, so that $C_i = 2/k_i$ and the clustering coefficient $C$ can be rewritten as

$$C = \frac{2}{N}\sum_{i=1}^{N}\frac{1}{k_i}. \tag{21}$$

Since the degree distribution is $p(k) = c_1 k^{-3}$ with $k = 2, 3, \cdots, k_{\max}$, the clustering coefficient $C$ can be rewritten as

$$C = \sum_{k=2}^{k_{\max}} \frac{2}{N}\frac{N p(k)}{k} = 2c_1 \sum_{k=2}^{k_{\max}} k^{-4}. \tag{22}$$

For sufficiently large $N$, $k_{\max} \gg 2$. The parameter $c_1$ satisfies the normalization equation

$$\sum_{k=2}^{k_{\max}} p(k) = 1. \tag{23}$$

It can be obtained that $c_1 = 4.9491$ and $C = 2 \times 4.9491 \times \sum_{k=2}^{k_{\max}} k^{-4} = 0.8149$. This calculation is consistent with the observation that most real-life networks have large clustering coefficients no matter how many nodes they have. From Fig. 5, one can see that as the average out-degree increases, the clustering coefficient decreases dramatically, which indicates that the clustering coefficient $C$ depends on the average out-degree $m$.

FIG. 4: Dependence of the clustering coefficient on the degree $k$ when $N = 2000$. One can see that the clustering coefficient and the degree $k$ follow a reciprocal law.
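The two numbers quoted for $m = 1$ can be reproduced by summing the series directly. The sketch below assumes that the normalization is a discrete sum over $k = 2, \ldots, k_{\max}$, which is the reading that reproduces $c_1 = 4.9491$; the function name and the cutoff `k_max` are choices made here.

```python
def clustering_for_m1(k_max=100_000):
    """Return (c1, C) for p(k) = c1 * k**-3 on k = 2..k_max and C = 2*c1*sum(k**-4)."""
    s3 = sum(k ** -3 for k in range(2, k_max + 1))  # normalization sum
    s4 = sum(k ** -4 for k in range(2, k_max + 1))  # sum entering the clustering coefficient
    c1 = 1.0 / s3
    return c1, 2.0 * c1 * s4

c1, C = clustering_for_m1()
print(round(c1, 3), round(C, 3))
```

Both sums converge quickly, so the result is insensitive to the cutoff once `k_max` is in the thousands.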



IV. CONCLUSION AND DISCUSSION 

In summary, we have constructed a directed network model for the World-Wide Web. The resulting networks have both a very large clustering coefficient and a very small average distance. We argue that the degree distributions of many real-life directed networks, such as citation networks, the Internet and the World-Wide Web, may be fitted appropriately by two power-law distributions, i.e., in- and out-degree power-law distributions. Both the analytic and numerical studies indicate that the exponent of the in-degree distribution of the present networks is well fitted by $2 + 1/m$, which has been observed in the empirical data. Although this model is simple and rough, it offers a good starting point for explaining the existing empirical data and the relationship between the in- and out-degree distribution exponents.

FIG. 5: The clustering coefficient vs the network size $N$ for different $m$ in the undirected version of the present model. For $m = 1, 2, 3, 4$, the clustering coefficient of the network is almost constant, at 0.74, 0.28, 0.18 and 0.14, respectively. This indicates that the average clustering coefficient depends on the average out-degree $m$.